weight regularization
- Asia > Taiwan (0.06)
- North America > United States (0.05)
- North America > Canada (0.05)
- South America > Peru > Lima Department > Lima Province > Lima (0.04)
- Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.04)
Natural continual learning: success is a journey, not (just) a destination
Biological agents are known to learn many different tasks over the course of their lives, and to be able to revisit previous tasks and behaviors with little to no loss in performance. In contrast, artificial agents are prone to'catastrophic forgetting' whereby performance on previous tasks deteriorates rapidly as new ones are acquired. This shortcoming has recently been addressed using methods that encourage parameters to stay close to those used for previous tasks. This can be done by (i) using specific parameter regularizers that map out suitable destinations in parameter space, or (ii) guiding the optimization journey by projecting gradients into subspaces that do not interfere with previous tasks.
- South America > Peru > Lima Department > Lima Province > Lima (0.04)
- North America > United States > Missouri > St. Louis County > St. Louis (0.04)
- Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.04)
- Oceania > Australia > New South Wales > Sydney (0.05)
- North America > United States > Washington > Benton County > Richland (0.05)
- North America > United States > Tennessee > Anderson County > Oak Ridge (0.05)
Model Guidance via Robust Feature Attribution
Ghitu, Mihnea, Piratla, Vihari, Wicker, Matthew
Controlling the patterns a model learns is essential to preventing reliance on irrelevant or misleading features. Such reliance on irrelevant features, often called shortcut features, has been observed across domains, including medical imaging and natural language processing, where it may lead to real-world harms. A common mitigation strategy leverages annotations (provided by humans or machines) indicating which features are relevant or irrelevant. These annotations are compared to model explanations, typically in the form of feature salience, and used to guide the loss function during training. Unfortunately, recent works have demonstrated that feature salience methods are unreliable and therefore offer a poor signal to optimize. In this work, we propose a simplified objective that simultaneously optimizes for explanation robustness and mitigation of shortcut learning. Unlike prior objectives with similar aims, we demonstrate theoretically why our approach ought to be more effective. Across a comprehensive series of experiments, we show that our approach consistently reduces test-time misclassifications by 20% compared to state-of-the-art methods. We also extend prior experimental settings to include natural language processing tasks. Additionally, we conduct novel ablations that yield practical insights, including the relative importance of annotation quality over quantity. Code for our method and experiments is available at: https://github.com/Mihneaghitu/ModelGuidanceViaRobustFeatureAttribution.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Netherlands > North Brabant > 's-Hertogenbosch (0.04)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Heidelberg (0.04)
- Health & Medicine > Therapeutic Area (0.93)
- Health & Medicine > Diagnostic Medicine > Imaging (0.87)
Yet Unnoticed in LSTM: Binary Tree Based Input Reordering, Weight Regularization, and Gate Nonlinearization
LSTM models used in current Machine Learning literature and applications, has a promising solution for permitting long term information using gating mechanisms that forget and reduce effect of current input information. However, even with this pipeline, they do not optimally focus on specific old index or long-term information. This paper elaborates upon input reordering approaches to prioritize certain input indices. Moreover, no LSTM based approach is found in the literature that examines weight normalization while choosing the right weight and exponent of Lp norms through main supervised loss function. In this paper, we find out which norm best finds relationship between weights to either smooth or sparsify them. Lastly, gates, as weighted representations of inputs and states, which control reduction-extent of current input versus previous inputs (~ state), are not nonlinearized enough (through a small FFNN). As analogous to attention mechanisms, gates easily filter current information to bold (emphasize on) past inputs. Nonlinearized gates can more easily tune up to peculiar nonlinearities of specific input in the past. This type of nonlinearization is not proposed in the literature, to the best of author's knowledge. The proposed approaches are implemented and compared with a simple LSTM to understand their performance in text classification tasks. The results show they improve accuracy of LSTM.